# Efficient inference
## DiffuCoder-7B-cpGRPO-8bit

*mlx-community · Large Language Model · Other · 272 downloads · 2 likes*

DiffuCoder-7B-cpGRPO-8bit is a code generation model converted to MLX format from apple/DiffuCoder-7B-cpGRPO, designed to give developers an efficient code generation tool.

## ERNIE-4.5-21B-A3B-PT-8bit

*mlx-community · Apache-2.0 · Large Language Model · Supports Multiple Languages · 123 downloads · 1 like*

ERNIE-4.5-21B-A3B-PT-8bit is an 8-bit quantized version of Baidu's ERNIE-4.5-21B-A3B-PT model, converted to MLX format for Apple Silicon devices.

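Several entries on this page are 8-bit conversions like the one above. As a rough illustration of what such a conversion stores, here is a simplified sketch of symmetric per-tensor int8 weight quantization in plain Python (a generic scheme for illustration only; MLX itself quantizes group-wise with per-group scales):

```python
# Generic sketch of symmetric per-tensor 8-bit weight quantization.
# Each float weight becomes an int8 value plus one shared scale factor.

def quantize_int8(weights):
    """Map float weights to int8 codes and a single scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

w = [0.42, -1.27, 0.08, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(s, 5), round(err, 4))
```

Dequantization multiplies each code back by the scale; the reconstruction error is bounded by half a quantization step, which is why 8-bit weights usually cost little accuracy.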
## Magistral-Small-2506-abliterated GGUF

*DevQuasar · Large Language Model · 423 downloads · 1 like*

A quantized GGUF version of huihui-ai's Magistral-Small-2506-abliterated, released as part of an effort to make knowledge accessible to everyone.

## SLANet_plus

*PaddlePaddle · Apache-2.0 · Text Recognition · Supports Multiple Languages · 1,121 downloads · 0 likes*

SLANet_plus is a table structure recognition model that converts non-editable table images into editable formats such as HTML. It plays an important role in table recognition pipelines, improving both the accuracy and the efficiency of table recognition.

## Qwen3-Reranker-0.6B GGUF

*DevQuasar · Large Language Model · 1,481 downloads · 3 likes*

A quantized version of Qwen3-Reranker-0.6B, released as part of an effort to make knowledge accessible to everyone.

## MiniCPM4-MCP

*openbmb · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 367 downloads · 14 likes*

MiniCPM4-MCP is an open-source edge-side large language model agent built on the 8-billion-parameter MiniCPM-4. It interacts with a variety of tools and data resources through the Model Context Protocol (MCP) to solve a wide range of real-world tasks.

## gemma-3-27b-it-quantized.w4a16

*RedHatAI · Image-to-Text · Transformers · 302 downloads · 1 like*

A quantized version of google/gemma-3-27b-it supporting vision-text input and text output. Optimized through INT4 weight quantization (w4a16), it enables efficient inference with vLLM.

## fpham Sydney-Overthinker-13b-HF GGUF

*featherless-ai-quants · Large Language Model · 133 downloads · 1 like*

Optimized GGUF quantized files that can significantly improve model performance. The quantizations are sponsored by Featherless AI, where users can run any model of their choice for a small fee.

## DeepSeek-R1-0528-GPTQ-Int4-Int8Mix-Compact

*QuantTrio · MIT · Large Language Model · Transformers · 258 downloads · 1 like*

A GPTQ-quantized version of DeepSeek-R1-0528 using an Int4 + selective Int8 scheme, which reduces file size while preserving generation quality.

## Qwen2-Audio-7B-Instruct i1-GGUF

*mradermacher · Apache-2.0 · Audio-to-Text · Transformers · English · 282 downloads · 0 likes*

A weighted/imatrix quantized version of Qwen2-Audio-7B-Instruct, supporting English audio-to-text transcription tasks.

## DeepSeek-R1-0528-Qwen3-8B-MLX-4bit

*lmstudio-community · MIT · Large Language Model · 274.40k downloads · 1 like*

A large language model developed by DeepSeek AI, optimized with 4-bit quantization and suited to Apple Silicon devices.

## DeepSeek-R1-0528-4bit

*mlx-community · Large Language Model · 157 downloads · 9 likes*

DeepSeek-R1-0528-4bit is a 4-bit quantized model converted from DeepSeek-R1-0528 and optimized for the MLX framework.

## llm-jp-3.1-1.8b-instruct4

*llm-jp · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 165 downloads · 3 likes*

A large language model developed by Japan's National Institute of Informatics. Built on LLM-jp-3, it significantly improves instruction-following ability through instruction pre-training.

## llm-jp-3.1-1.8b

*llm-jp · Apache-2.0 · Large Language Model · Transformers · Supports Multiple Languages · 572 downloads · 1 like*

LLM-jp-3.1-1.8b is a large language model developed by Japan's National Institute of Informatics. Based on the LLM-jp-3 series, it incorporates instruction pre-training to enhance instruction-following ability.

## DMind-1-mini GGUF

*DevQuasar · Text Generation · 213 downloads · 1 like*

DMind-1-mini is a lightweight text generation model suitable for a variety of natural language processing tasks.

## academic-ds-9B GGUF

*DevQuasar · Large Language Model · 277 downloads · 1 like*

A quantized version of ByteDance-Seed's academic-ds-9B, aiming to make knowledge accessible to everyone.

## Devstral-Small-2505-MLX-4bit

*lmstudio-community · Apache-2.0 · Large Language Model · Supports Multiple Languages · 57.83k downloads · 3 likes*

The Devstral-Small-2505 model developed by mistralai, quantized to 4-bit in MLX format for Apple Silicon devices.

## Facebook KernelLLM GGUF

*bartowski · Other · Large Language Model · 5,151 downloads · 2 likes*

KernelLLM is a large language model developed by Facebook. This version is quantized with llama.cpp using imatrix, offering multiple quantization options to suit different hardware requirements.

## AM-Thinking-v1 GGUF

*bartowski · Apache-2.0 · Large Language Model · 671 downloads · 1 like*

A llama.cpp imatrix quantization of the a-m-team/AM-Thinking-v1 model, supporting multiple quantization types and suitable for text generation tasks.

## TheDrummer Snowpiercer-15B-v1 GGUF

*bartowski · MIT · Large Language Model · 4,783 downloads · 1 like*

A quantized version of the TheDrummer/Snowpiercer-15B-v1 model, produced with llama.cpp and suitable for text generation tasks.

## Mellum-4b-sft-rust GGUF

*Etherll · Apache-2.0 · Large Language Model · Supports Multiple Languages · 389 downloads · 1 like*

A large language model fine-tuned specifically for Rust fill-in-the-middle (FIM) code completion, built on JetBrains/Mellum-4b-base.

## Qwen3-30B-A3B-4bit-DWQ

*mlx-community · Apache-2.0 · Large Language Model · 561 downloads · 19 likes*

A 4-bit quantized version of the Qwen3-30B-A3B model, created with a custom DWQ quantization scheme distilled from 6-bit down to 4-bit, suitable for text generation tasks.

## Qwen3-30B-A3B-FP8-dynamic

*RedHatAI · Apache-2.0 · Large Language Model · Transformers · 187 downloads · 2 likes*

Qwen3-30B-A3B-FP8-dynamic is an FP8 quantized version of the Qwen3-30B-A3B model, significantly reducing memory requirements and computational cost while maintaining the accuracy of the original model.

## Qwen3-8B-FP8-dynamic

*RedHatAI · Apache-2.0 · Large Language Model · Transformers · 81 downloads · 1 like*

Qwen3-8B-FP8-dynamic is an FP8 quantized version of Qwen3-8B, significantly reducing GPU memory and disk space requirements while maintaining the original model's performance.

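The "dynamic" in the two FP8 entries above refers to computing activation scales at runtime rather than fixing them in advance from a calibration set. A minimal sketch of dynamic per-tensor scaling into the FP8 E4M3 range (E4M3's largest finite magnitude is 448; this simulates only the range mapping, not real FP8 rounding):

```python
# Sketch of dynamic per-tensor scaling for FP8 E4M3 quantization.
# The scale is derived at runtime from each tensor's absolute maximum,
# so the largest value lands exactly at the E4M3 representable limit.

E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3

def dynamic_fp8_scale(tensor):
    """Pick a per-tensor scale from the runtime absolute maximum."""
    return max(abs(x) for x in tensor) / E4M3_MAX

acts = [3.5, -120.0, 0.25, 77.0]
scale = dynamic_fp8_scale(acts)
scaled = [x / scale for x in acts]  # values now fit the E4M3 range
print(round(scale, 6), max(abs(x) for x in scaled))
```

Static FP8 schemes instead bake one scale in per tensor from calibration data; dynamic scaling trades a small runtime cost for robustness to activation outliers.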
## Industry-Project-V2

*omsh97 · Apache-2.0 · Large Language Model · 58 downloads · 0 likes*

An instruction fine-tuned model based on the Mistral architecture, suitable for zero-shot classification tasks.

## Qwen3-8B GGUF

*ZeroWw · MIT · Large Language Model · English · 236 downloads · 1 like*

A quantized text generation model that keeps the output and embedding tensors in f16 while quantizing the remaining tensors to q5_k or q6_k, yielding a smaller file with performance comparable to pure f16.

## Qwen3-4B GGUF

*ZeroWw · MIT · Large Language Model · English · 495 downloads · 2 likes*

A quantized text generation model with output and embedding tensors in f16 and the remaining tensors in q5_k or q6_k, yielding a smaller file with performance comparable to the pure f16 version.

## Qwen3-8B-Base

*unsloth · Apache-2.0 · Large Language Model · Transformers · 5,403 downloads · 1 like*

Qwen3-8B-Base belongs to the latest generation of the Tongyi (Qwen) large model series, with 8.2 billion parameters and support for 119 languages, and is suitable for a variety of natural language processing tasks.

## Qwen3-0.6B-Base-unsloth-bnb-4bit

*unsloth · Apache-2.0 · Large Language Model · Transformers · 10.84k downloads · 1 like*

Qwen3-0.6B-Base belongs to the latest generation of the Tongyi (Qwen) large language model series. It has 0.6B parameters, supports 119 languages, and offers a context length of up to 32,768 tokens.

## InternVL2_5-1B-MNN

*taobao-mnn · Apache-2.0 · Large Language Model · English · 2,718 downloads · 1 like*

A 4-bit quantized version of InternVL2_5-1B, suitable for text generation and chat scenarios.

## GLM-Z1-32B-0414-4bit

*mlx-community · MIT · Large Language Model · Supports Multiple Languages · 225 downloads · 2 likes*

A 4-bit quantized version converted from THUDM/GLM-Z1-32B-0414, suitable for text generation tasks.

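Many entries here use 4-bit weights. A generic sketch of group-wise asymmetric 4-bit quantization, where each group of weights stores its own scale and minimum and each value becomes a 4-bit code 0..15 (group size 4 here purely for readability; real kernels use larger groups, e.g. MLX defaults to 64):

```python
# Generic sketch of group-wise asymmetric 4-bit quantization.
# Each group keeps (codes, scale, minimum); codes are integers 0..15.

def quantize_4bit(weights, group_size=4):
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15 or 1.0          # avoid zero scale
        q = [min(15, max(0, round((w - lo) / scale))) for w in g]
        groups.append((q, scale, lo))
    return groups

def dequantize_4bit(groups):
    out = []
    for q, scale, lo in groups:
        out.extend(v * scale + lo for v in q)
    return out

w = [0.1, -0.8, 0.55, 0.3, 2.0, 1.2, -0.4, 0.0]
packed = quantize_4bit(w)
restored = dequantize_4bit(packed)
err = max(abs(a - b) for a, b in zip(w, restored))
print(round(err, 3))
```

Smaller groups bound the error more tightly but cost more metadata per weight, which is the central trade-off these 4-bit model releases tune.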
## OPENCLIP-SigLIP-Tiny-14-Distill-SigLIP-400m-cc9m

*PumeTu · MIT · Image Classification · 30 downloads · 0 likes*

A lightweight vision-language model based on the SigLIP architecture, distilled from the larger SigLIP-400m model, suitable for zero-shot image classification tasks.

## DeepSeek-R1-quantized.w4a16

*RedHatAI · MIT · Large Language Model · 119 downloads · 4 likes*

An INT4 weight-quantized version of DeepSeek-R1 that reduces GPU memory and disk space requirements by roughly 50% while maintaining the original model's performance.

## Falcon-E-3B-Base

*tiiuae · Other · Large Language Model · Transformers · 51 downloads · 6 likes*

Falcon-E is a 1.58-bit (ternary) language model developed by TII, with a pure Transformer architecture designed for efficient inference.

## BitNet-b1.58-2B-4T GGUF

*microsoft · MIT · Large Language Model · English · 25.77k downloads · 143 likes*

The first open-source, native 1-bit large language model developed by Microsoft Research, with 2 billion parameters trained on a corpus of 4 trillion tokens.

## BitNet-b1.58-2B-4T

*microsoft · MIT · Large Language Model · Transformers · English · 35.87k downloads · 846 likes*

The first open-source 2-billion-parameter native 1-bit large language model developed by Microsoft Research, trained on 4 trillion tokens. It demonstrates that native 1-bit models can significantly improve computational efficiency while matching the performance of full-precision open-source models of the same scale.

## BitNet-b1.58-2B-4T-bf16

*microsoft · MIT · Large Language Model · Transformers · English · 2,968 downloads · 24 likes*

An open-source native 1-bit large language model developed by Microsoft Research, with 2 billion parameters trained on a 4-trillion-token corpus, significantly improving computational efficiency.

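The "b1.58" in these BitNet entries refers to ternary weights: each weight takes one of three values, carrying log2(3) ≈ 1.58 bits of information. A minimal sketch of the absmean ternarization described for BitNet b1.58 (scale by the mean absolute weight, then round and clamp to {-1, 0, +1}):

```python
# Sketch of ternary ("1.58-bit") weight quantization, BitNet b1.58 style:
# scale by the mean absolute weight, then round and clamp to {-1, 0, +1}.

def ternarize(weights):
    gamma = sum(abs(w) for w in weights) / len(weights)  # absmean scale
    q = [max(-1, min(1, round(w / gamma))) for w in weights]
    return q, gamma

w = [0.9, -0.05, -1.3, 0.4]
q, gamma = ternarize(w)
print(q, round(gamma, 4))
```

Because every weight is -1, 0, or +1, matrix multiplication reduces to additions and subtractions scaled by gamma, which is the source of the efficiency gains these model cards claim.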
## ModernCamemBERT-base

*almanach · MIT · Large Language Model · Transformers · French · 213 downloads · 4 likes*

ModernCamemBERT is a French language model pre-trained on a high-quality 1T-token French corpus. It is the French counterpart of ModernBERT, focusing on long context and efficient inference.

## vit-base16-fine-tuned-crop-disease-model

*sabari15 · Transformers · 179 downloads · 2 likes*

A transformers model hosted on the Hugging Face Hub. Its model card does not explicitly state its functionality, though the name suggests a ViT-Base/16 fine-tuned for crop disease classification.

## MTMME-Merge-Gemma-2-9B NuSLERP (W0.7/0.3)

*zelk12 · Large Language Model · Transformers · 16 downloads · 2 likes*

A variant of Gemma-2-9B produced with SLERP-based merging, combining two differently weighted versions of the model (weights 0.7 and 0.3).

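The SLERP merge named above interpolates model weights along the arc of a sphere rather than the straight line between them, preserving weight norms better than plain averaging. A minimal SLERP sketch in plain Python (the 0.7/0.3 split mirrors the weights in the entry name; a real merge applies this tensor-by-tensor across two checkpoints):

```python
import math

# Sketch of SLERP (spherical linear interpolation) between two weight
# vectors a and b; t controls the blend (t=0 gives a, t=1 gives b).

def slerp(a, b, t):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    cos_theta = max(-1.0, min(1.0, dot / (na * nb)))
    theta = math.acos(cos_theta)
    if theta < 1e-8:                      # near-parallel: fall back to lerp
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(theta)
    wa = math.sin((1 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Orthogonal unit vectors: the result stays on the unit sphere.
merged = slerp([1.0, 0.0], [0.0, 1.0], 0.7)
print([round(v, 4) for v in merged])
```

Unlike linear interpolation, which would shrink the result toward the origin here, SLERP keeps the merged vector at unit norm.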